Overview

Dataset statistics

Number of variables: 8
Number of observations: 753
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 47.2 KiB
Average record size in memory: 64.2 B

Variable types

Numeric: 6
Categorical: 2

Alerts

District has a high cardinality: 77 distinct values (High cardinality)
Local Level Name has a high cardinality: 737 distinct values (High cardinality)
Total family number is highly correlated with Total household number and 3 other fields (High correlation)
Total household number is highly correlated with Total family number and 3 other fields (High correlation)
Total population is highly correlated with Total family number and 3 other fields (High correlation)
Total Male is highly correlated with Total family number and 3 other fields (High correlation)
Total Female is highly correlated with Total family number and 3 other fields (High correlation)
_id is highly correlated with District (High correlation)
District is highly correlated with _id and 5 other fields (High correlation)
Total family number is highly correlated with District and 4 other fields (High correlation)
Total household number is highly correlated with District and 4 other fields (High correlation)
Total population is highly correlated with District and 4 other fields (High correlation)
Total Male is highly correlated with District and 4 other fields (High correlation)
Total Female is highly correlated with District and 4 other fields (High correlation)
_id is uniformly distributed (Uniform)
Local Level Name is uniformly distributed (Uniform)
_id has unique values (Unique)

Reproduction

Analysis started: 2022-04-20 13:28:53.519164
Analysis finished: 2022-04-20 13:29:00.498161
Duration: 6.98 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 753
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 377
Minimum: 1
Maximum: 753
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 38.6
Q1: 189
Median: 377
Q3: 565
95-th percentile: 715.4
Maximum: 753
Range: 752
Interquartile range (IQR): 376
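
The quantile statistics for _id are consistent with percentiles computed by linear interpolation between order statistics. The following is a minimal sketch, assuming that _id holds the integers 1 through 753 (as its minimum, maximum, and distinct count suggest); the helper name `percentile` is hypothetical and is not part of the report or of pandas-profiling.

```python
def percentile(sorted_values, q):
    """Return the q-th percentile (0 <= q <= 100) by linear interpolation."""
    # Fractional position of the percentile among the sorted values.
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    low_value = sorted_values[lower]
    high_value = sorted_values[upper]
    return low_value + (high_value - low_value) * fraction


# Assumption: _id is the sequence 1..753.
ids = list(range(1, 754))

p5 = percentile(ids, 5)
q1 = percentile(ids, 25)
median = percentile(ids, 50)
q3 = percentile(ids, 75)
p95 = percentile(ids, 95)
iqr = q3 - q1

print(p5, q1, median, q3, p95, iqr)
```

Under that assumption the values agree with the report up to floating-point rounding: the 5-th percentile falls 0.6 of the way between 38 and 39, and the interquartile range is Q3 minus Q1.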

Descriptive statistics

Standard deviation: 217.516666
Coefficient of variation (CV): 0.5769672839
Kurtosis: -1.2
Mean: 377
Median Absolute Deviation (MAD): 188
Skewness: 0
Sum: 283881
Variance: 47313.5
Monotonicity: Strictly increasing
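
Several of the descriptive statistics for _id can be reproduced by hand. The sketch below assumes that _id is the sequence 1 through 753 and that the variance is the sample variance (divisor n - 1); both are assumptions inferred from the reported figures, not statements from the report itself.

```python
import math

# Assumption: _id is the sequence 1..753.
ids = list(range(1, 754))
n = len(ids)

total = sum(ids)
mean = total / n

# Sample variance (divisor n - 1) and standard deviation.
variance = sum((x - mean) ** 2 for x in ids) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Median absolute deviation: median of |x - median|; n is odd here.
median = sorted(ids)[n // 2]
mad = sorted(abs(x - median) for x in ids)[n // 2]

print(total, mean, variance, round(std, 6), round(cv, 10), mad)
```

With these assumptions the results agree with the reported sum, mean, variance, standard deviation, CV, and MAD.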
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | 0.1%
507 | 1 | 0.1%
498 | 1 | 0.1%
499 | 1 | 0.1%
500 | 1 | 0.1%
501 | 1 | 0.1%
502 | 1 | 0.1%
503 | 1 | 0.1%
504 | 1 | 0.1%
505 | 1 | 0.1%
Other values (743) | 743 | 98.7%

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Value | Count | Frequency (%)
753 | 1 | 0.1%
752 | 1 | 0.1%
751 | 1 | 0.1%
750 | 1 | 0.1%
749 | 1 | 0.1%
748 | 1 | 0.1%
747 | 1 | 0.1%
746 | 1 | 0.1%
745 | 1 | 0.1%
744 | 1 | 0.1%

District
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 10.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 KiB
Sarlahi: 20
Dhanusa: 18
Rautahat: 18
Saptari: 18
Morang: 17
Other values (72): 662

Length

Max length: 16
Median length: 7
Mean length: 7.460823373
Min length: 4

Characters and Unicode

Total characters: 5618
Distinct characters: 43
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Taplejung
2nd row: Taplejung
3rd row: Taplejung
4th row: Taplejung
5th row: Taplejung

Common Values

Value | Count | Frequency (%)
Sarlahi | 20 | 2.7%
Dhanusa | 18 | 2.4%
Rautahat | 18 | 2.4%
Saptari | 18 | 2.4%
Morang | 17 | 2.3%
Siraha | 17 | 2.3%
Rupandehi | 16 | 2.1%
Bara | 16 | 2.1%
Jhapa | 15 | 2.0%
Mahottari | 15 | 2.0%
Other values (67) | 583 | 77.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
sarlahi | 20 | 2.6%
saptari | 18 | 2.3%
dhanusa | 18 | 2.3%
rautahat | 18 | 2.3%
siraha | 17 | 2.2%
morang | 17 | 2.2%
rupandehi | 16 | 2.1%
bara | 16 | 2.1%
jhapa | 15 | 1.9%
mahottari | 15 | 1.9%
Other values (67) | 607 | 78.1%

Most occurring characters

Value | Count | Frequency (%)
a | 1202 | 21.4%
h | 448 | 8.0%
u | 362 | 6.4%
n | 315 | 5.6%
i | 291 | 5.2%
r | 290 | 5.2%
l | 266 | 4.7%
t | 254 | 4.5%
p | 190 | 3.4%
k | 168 | 3.0%
Other values (33) | 1832 | 32.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 4817 | 85.7%
Uppercase Letter | 777 | 13.8%
Space Separator | 24 | 0.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 1202 | 25.0%
h | 448 | 9.3%
u | 362 | 7.5%
n | 315 | 6.5%
i | 291 | 6.0%
r | 290 | 6.0%
l | 266 | 5.5%
t | 254 | 5.3%
p | 190 | 3.9%
k | 168 | 3.5%
Other values (12) | 1031 | 21.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 136 | 17.5%
D | 101 | 13.0%
B | 86 | 11.1%
K | 80 | 10.3%
R | 66 | 8.5%
M | 61 | 7.9%
P | 48 | 6.2%
J | 30 | 3.9%
N | 27 | 3.5%
T | 25 | 3.2%
Other values (10) | 117 | 15.1%

Space Separator
Value | Count | Frequency (%)
(space) | 24 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5594 | 99.6%
Common | 24 | 0.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 1202 | 21.5%
h | 448 | 8.0%
u | 362 | 6.5%
n | 315 | 5.6%
i | 291 | 5.2%
r | 290 | 5.2%
l | 266 | 4.8%
t | 254 | 4.5%
p | 190 | 3.4%
k | 168 | 3.0%
Other values (32) | 1808 | 32.3%

Common
Value | Count | Frequency (%)
(space) | 24 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5618 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 1202 | 21.4%
h | 448 | 8.0%
u | 362 | 6.4%
n | 315 | 5.6%
i | 291 | 5.2%
r | 290 | 5.2%
l | 266 | 4.7%
t | 254 | 4.5%
p | 190 | 3.4%
k | 168 | 3.0%
Other values (33) | 1832 | 32.6%

Local Level Name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 737
Distinct (%): 97.9%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 KiB
Sunkoshi Rural Municipality: 3
Tribeni Rural Municipality: 3
Bishnupur Rural Municipality: 2
Musikot Municipality: 2
Mahalaxmi Municipality: 2
Other values (732): 741

Length

Max length: 43
Median length: 26
Mean length: 26.02257636
Min length: 16

Characters and Unicode

Total characters: 19595
Distinct characters: 50
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 723
Unique (%): 96.0%

Sample

1st row: Aathrai Tribeni Rural Municipality
2nd row: Maiwakhola Rural Municipality
3rd row: Meringden Rural Municipality
4th row: Mikwakhola Rural Municipality
5th row: Phaktanglung Rural Municipality

Common Values

Value | Count | Frequency (%)
Sunkoshi Rural Municipality | 3 | 0.4%
Tribeni Rural Municipality | 3 | 0.4%
Bishnupur Rural Municipality | 2 | 0.3%
Musikot Municipality | 2 | 0.3%
Mahalaxmi Municipality | 2 | 0.3%
Likhu Rural Municipality | 2 | 0.3%
Miklajung Rural Municipality | 2 | 0.3%
Madi Municipality | 2 | 0.3%
Malika Rural Municipality | 2 | 0.3%
Madi Rural Municipality | 2 | 0.3%
Other values (727) | 731 | 97.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
municipality | 736 | 35.7%
rural | 461 | 22.4%
city | 17 | 0.8%
sub-metropolitian | 11 | 0.5%
tribeni | 7 | 0.3%
metropolitian | 6 | 0.3%
madi | 4 | 0.2%
rapti | 3 | 0.1%
bagmati | 3 | 0.1%
bheri | 3 | 0.1%
Other values (781) | 809 | 39.3%

Most occurring characters

Value | Count | Frequency (%)
i | 2837 | 14.5%
a | 2654 | 13.5%
u | 1556 | 7.9%
l | 1424 | 7.3%
(space) | 1307 | 6.7%
n | 1177 | 6.0%
t | 984 | 5.0%
r | 908 | 4.6%
p | 879 | 4.5%
M | 837 | 4.3%
Other values (40) | 5032 | 25.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 16199 | 82.7%
Uppercase Letter | 2077 | 10.6%
Space Separator | 1307 | 6.7%
Dash Punctuation | 12 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 2837 | 17.5%
a | 2654 | 16.4%
u | 1556 | 9.6%
l | 1424 | 8.8%
n | 1177 | 7.3%
t | 984 | 6.1%
r | 908 | 5.6%
p | 879 | 5.4%
y | 829 | 5.1%
c | 796 | 4.9%
Other values (15) | 2155 | 13.3%

Uppercase Letter
Value | Count | Frequency (%)
M | 837 | 40.3%
R | 505 | 24.3%
B | 126 | 6.1%
S | 108 | 5.2%
K | 80 | 3.9%
C | 60 | 2.9%
P | 56 | 2.7%
D | 53 | 2.6%
T | 51 | 2.5%
G | 48 | 2.3%
Other values (13) | 153 | 7.4%

Space Separator
Value | Count | Frequency (%)
(space) | 1307 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 12 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 18276 | 93.3%
Common | 1319 | 6.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 2837 | 15.5%
a | 2654 | 14.5%
u | 1556 | 8.5%
l | 1424 | 7.8%
n | 1177 | 6.4%
t | 984 | 5.4%
r | 908 | 5.0%
p | 879 | 4.8%
M | 837 | 4.6%
y | 829 | 4.5%
Other values (38) | 4191 | 22.9%

Common
Value | Count | Frequency (%)
(space) | 1307 | 99.1%
- | 12 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 19595 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 2837 | 14.5%
a | 2654 | 13.5%
u | 1556 | 7.9%
l | 1424 | 7.3%
(space) | 1307 | 6.7%
n | 1177 | 6.0%
t | 984 | 5.0%
r | 908 | 4.6%
p | 879 | 4.5%
M | 837 | 4.3%
Other values (40) | 5032 | 25.7%

Total family number
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 736
Distinct (%): 97.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8976.723772
Minimum: 125
Maximum: 231714
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 125
5-th percentile: 2159.2
Q1: 4036
Median: 5896
Q3: 9510
95-th percentile: 23710.2
Maximum: 231714
Range: 231589
Interquartile range (IQR): 5474

Descriptive statistics

Standard deviation: 12986.55224
Coefficient of variation (CV): 1.446691752
Kurtosis: 134.0640627
Mean: 8976.723772
Median Absolute Deviation (MAD): 2341
Skewness: 9.434370137
Sum: 6759473
Variance: 168650539
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
5305 | 3 | 0.4%
3627 | 2 | 0.3%
8034 | 2 | 0.3%
4031 | 2 | 0.3%
5148 | 2 | 0.3%
6812 | 2 | 0.3%
3192 | 2 | 0.3%
4822 | 2 | 0.3%
5195 | 2 | 0.3%
4143 | 2 | 0.3%
Other values (726) | 732 | 97.2%

Value | Count | Frequency (%)
125 | 1 | 0.1%
320 | 1 | 0.1%
391 | 1 | 0.1%
456 | 1 | 0.1%
459 | 1 | 0.1%
471 | 1 | 0.1%
540 | 1 | 0.1%
543 | 1 | 0.1%
558 | 1 | 0.1%
600 | 1 | 0.1%

Value | Count | Frequency (%)
231714 | 1 | 0.1%
143137 | 1 | 0.1%
98288 | 1 | 0.1%
77872 | 1 | 0.1%
57383 | 1 | 0.1%
51619 | 1 | 0.1%
51099 | 1 | 0.1%
50743 | 1 | 0.1%
47218 | 1 | 0.1%
47169 | 1 | 0.1%

Total household number
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 720
Distinct (%): 95.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7493.590969
Minimum: 125
Maximum: 105649
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 125
5-th percentile: 1995.2
Q1: 3653
Median: 5195
Q3: 8151
95-th percentile: 19439
Maximum: 105649
Range: 105524
Interquartile range (IQR): 4498

Descriptive statistics

Standard deviation: 8401.892705
Coefficient of variation (CV): 1.121210477
Kurtosis: 54.11716468
Mean: 7493.590969
Median Absolute Deviation (MAD): 2004
Skewness: 5.946940988
Sum: 5642674
Variance: 70591801.03
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2735 | 2 | 0.3%
6306 | 2 | 0.3%
2362 | 2 | 0.3%
1364 | 2 | 0.3%
2271 | 2 | 0.3%
4795 | 2 | 0.3%
3421 | 2 | 0.3%
3652 | 2 | 0.3%
4897 | 2 | 0.3%
4231 | 2 | 0.3%
Other values (710) | 733 | 97.3%

Value | Count | Frequency (%)
125 | 1 | 0.1%
306 | 1 | 0.1%
333 | 1 | 0.1%
441 | 1 | 0.1%
444 | 1 | 0.1%
454 | 1 | 0.1%
511 | 1 | 0.1%
539 | 1 | 0.1%
542 | 1 | 0.1%
548 | 1 | 0.1%

Value | Count | Frequency (%)
105649 | 1 | 0.1%
101669 | 1 | 0.1%
77838 | 1 | 0.1%
49044 | 1 | 0.1%
45240 | 1 | 0.1%
45204 | 1 | 0.1%
41012 | 1 | 0.1%
40207 | 1 | 0.1%
39425 | 1 | 0.1%
39249 | 1 | 0.1%

Total population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 747
Distinct (%): 99.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38612.20452
Minimum: 398
Maximum: 845767
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 398
5-th percentile: 8931.2
Q1: 16768
Median: 25508
Q3: 44929
95-th percentile: 100099.4
Maximum: 845767
Range: 845369
Interquartile range (IQR): 28161

Descriptive statistics

Standard deviation: 49904.37721
Coefficient of variation (CV): 1.292450867
Kurtosis: 106.3215051
Mean: 38612.20452
Median Absolute Deviation (MAD): 11398
Skewness: 8.168110664
Sum: 29074990
Variance: 2490446865
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
15237 | 2 | 0.3%
23891 | 2 | 0.3%
18751 | 2 | 0.3%
10205 | 2 | 0.3%
42528 | 2 | 0.3%
13900 | 2 | 0.3%
19835 | 1 | 0.1%
57341 | 1 | 0.1%
23124 | 1 | 0.1%
17077 | 1 | 0.1%
Other values (737) | 737 | 97.9%

Value | Count | Frequency (%)
398 | 1 | 0.1%
1272 | 1 | 0.1%
1535 | 1 | 0.1%
1584 | 1 | 0.1%
1671 | 1 | 0.1%
1713 | 1 | 0.1%
1997 | 1 | 0.1%
2462 | 1 | 0.1%
2581 | 1 | 0.1%
2611 | 1 | 0.1%

Value | Count | Frequency (%)
845767 | 1 | 0.1%
518452 | 1 | 0.1%
369377 | 1 | 0.1%
299843 | 1 | 0.1%
268273 | 1 | 0.1%
244750 | 1 | 0.1%
204788 | 1 | 0.1%
201079 | 1 | 0.1%
198098 | 1 | 0.1%
195951 | 1 | 0.1%

Total Male
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 746
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18842.81408
Minimum: 170
Maximum: 431501
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 170
5-th percentile: 4448.6
Q1: 8115
Median: 12248
Q3: 22100
95-th percentile: 48053.4
Maximum: 431501
Range: 431331
Interquartile range (IQR): 13985

Descriptive statistics

Standard deviation: 24891.92856
Coefficient of variation (CV): 1.321030312
Kurtosis: 114.4046473
Mean: 18842.81408
Median Absolute Deviation (MAD): 5539
Skewness: 8.481348417
Sum: 14188639
Variance: 619608107.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
10214 | 2 | 0.3%
12169 | 2 | 0.3%
9626 | 2 | 0.3%
6920 | 2 | 0.3%
6158 | 2 | 0.3%
10670 | 2 | 0.3%
5713 | 2 | 0.3%
34316 | 1 | 0.1%
8628 | 1 | 0.1%
35180 | 1 | 0.1%
Other values (736) | 736 | 97.7%

Value | Count | Frequency (%)
170 | 1 | 0.1%
728 | 1 | 0.1%
748 | 1 | 0.1%
818 | 1 | 0.1%
831 | 1 | 0.1%
839 | 1 | 0.1%
1132 | 1 | 0.1%
1206 | 1 | 0.1%
1289 | 1 | 0.1%
1330 | 1 | 0.1%

Value | Count | Frequency (%)
431501 | 1 | 0.1%
250999 | 1 | 0.1%
179744 | 1 | 0.1%
150702 | 1 | 0.1%
139849 | 1 | 0.1%
122769 | 1 | 0.1%
102334 | 1 | 0.1%
100417 | 1 | 0.1%
97324 | 1 | 0.1%
95705 | 1 | 0.1%

Total Female
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 743
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19769.39044
Minimum: 228
Maximum: 414266
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 228
5-th percentile: 4450.2
Q1: 8553
Median: 13010
Q3: 23068
95-th percentile: 51533.2
Maximum: 414266
Range: 414038
Interquartile range (IQR): 14515

Descriptive statistics

Standard deviation: 25046.36215
Coefficient of variation (CV): 1.266926374
Kurtosis: 98.60091807
Mean: 19769.39044
Median Absolute Deviation (MAD): 5726
Skewness: 7.861075579
Sum: 14886351
Variance: 627320256.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
8410 | 3 | 0.4%
27159 | 2 | 0.3%
14711 | 2 | 0.3%
8054 | 2 | 0.3%
8816 | 2 | 0.3%
8480 | 2 | 0.3%
11467 | 2 | 0.3%
10910 | 2 | 0.3%
8446 | 2 | 0.3%
28529 | 1 | 0.1%
Other values (733) | 733 | 97.3%

Value | Count | Frequency (%)
228 | 1 | 0.1%
544 | 1 | 0.1%
766 | 1 | 0.1%
787 | 1 | 0.1%
840 | 1 | 0.1%
865 | 1 | 0.1%
874 | 1 | 0.1%
1256 | 1 | 0.1%
1275 | 1 | 0.1%
1292 | 1 | 0.1%

Value | Count | Frequency (%)
414266 | 1 | 0.1%
267453 | 1 | 0.1%
189633 | 1 | 0.1%
149141 | 1 | 0.1%
128424 | 1 | 0.1%
121981 | 1 | 0.1%
106695 | 1 | 0.1%
103790 | 1 | 0.1%
102454 | 1 | 0.1%
99349 | 1 | 0.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
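
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the implementation used by the report; the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and ties are given average ranks by assumption.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors in covariance and variances cancel, so they are omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because `y` is a monotonic but nonlinear function of `x`, ρ reaches 1 (up to rounding) while Pearson's r stays below 1.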

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
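
A minimal sketch of this calculation, with a check of the invariance under changes in location and (positive) scale. The function name `pearson_r` and the sample values are made up for illustration and do not come from the dataset.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Illustrative values only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_shifted, y_shifted)

print(r_original, r_transformed)
```

The two coefficients agree up to floating-point rounding; a negative scale factor on one variable would flip the sign of r.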

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
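
The pair-counting definition above can be sketched directly. This version follows the description literally (often called tau-a), so tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie correction instead, so results may differ when ties are present. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


tau_same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
tau_mixed = kendall_tau([1, 2, 3], [1, 3, 2])
print(tau_same, tau_reversed, tau_mixed)
```

In the mixed case two of the three pairs are concordant and one is discordant, giving (2 - 1) / 3.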

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

(index) | _id | District | Local Level Name | Total family number | Total household number | Total population | Total Male | Total Female
0 | 1 | Taplejung | Aathrai Tribeni Rural Municipality | 2869 | 2735 | 12288 | 6005 | 6283
1 | 2 | Taplejung | Maiwakhola Rural Municipality | 2275 | 2178 | 10365 | 5264 | 5101
2 | 3 | Taplejung | Meringden Rural Municipality | 2683 | 2528 | 12040 | 6181 | 5859
3 | 4 | Taplejung | Mikwakhola Rural Municipality | 1862 | 1792 | 7991 | 4000 | 3991
4 | 5 | Taplejung | Phaktanglung Rural Municipality | 2864 | 2700 | 11925 | 6239 | 5686
5 | 6 | Taplejung | Phungling Municipality | 7306 | 5888 | 28786 | 14160 | 14626
6 | 7 | Taplejung | Sidingba Rural Municipality | 2604 | 2484 | 10981 | 5593 | 5388
7 | 8 | Taplejung | Sirijangha Rural Municipality | 3329 | 3197 | 14186 | 7227 | 6959
8 | 9 | Taplejung | Pathivara Yangwarak Rural Municipality | 2738 | 2637 | 11797 | 5855 | 5942
9 | 10 | Panchthar | Falelung Rural Municipality | 4940 | 4773 | 20531 | 10211 | 10320

Last rows

(index) | _id | District | Local Level Name | Total family number | Total household number | Total population | Total Male | Total Female
743 | 744 | Baitadi | Surnaya Rural Municipality | 3877 | 3206 | 18230 | 8549 | 9681
744 | 745 | Darchula | Apihimal Rural Municipality | 1363 | 1256 | 7023 | 3438 | 3585
745 | 746 | Darchula | Byas Rural Municipality | 2361 | 2168 | 10205 | 4884 | 5321
746 | 747 | Darchula | Dunhu Rural Municipality | 2293 | 1906 | 9912 | 4580 | 5332
747 | 748 | Darchula | Lekam Rural Municipality | 3045 | 2645 | 13638 | 6520 | 7118
748 | 749 | Darchula | Mahakali Municipality | 6080 | 4596 | 24572 | 12072 | 12500
749 | 750 | Darchula | Malikaarjun Rural Municipality | 3241 | 2993 | 15754 | 7631 | 8123
750 | 751 | Darchula | Marma Rural Municipality | 3086 | 2704 | 15586 | 7551 | 8035
751 | 752 | Darchula | Naugad Rural Municipality | 3156 | 2920 | 16434 | 7988 | 8446
752 | 753 | Darchula | Shailyashikhar Municipality | 4561 | 4012 | 21932 | 10685 | 11247
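
In the sample rows, Total population equals Total Male plus Total Female, which would be consistent with the high-correlation alerts among these columns. The short check below uses values copied from the last rows listed above; the variable names are illustrative, and the check covers only these ten rows, not the full dataset.

```python
# (Local Level Name, Total population, Total Male, Total Female),
# copied from the last ten rows of the sample.
last_rows = [
    ("Surnaya Rural Municipality", 18230, 8549, 9681),
    ("Apihimal Rural Municipality", 7023, 3438, 3585),
    ("Byas Rural Municipality", 10205, 4884, 5321),
    ("Dunhu Rural Municipality", 9912, 4580, 5332),
    ("Lekam Rural Municipality", 13638, 6520, 7118),
    ("Mahakali Municipality", 24572, 12072, 12500),
    ("Malikaarjun Rural Municipality", 15754, 7631, 8123),
    ("Marma Rural Municipality", 15586, 7551, 8035),
    ("Naugad Rural Municipality", 16434, 7988, 8446),
    ("Shailyashikhar Municipality", 21932, 10685, 11247),
]

# Collect any row where the population is not the sum of male and female counts.
mismatches = [
    name
    for name, population, male, female in last_rows
    if population != male + female
]

print("rows checked:", len(last_rows), "mismatches:", mismatches)
```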